Exploring the Fairness and Resource Distribution in an Apache Mesos Environment
Apache Mesos, a cluster-wide resource manager, is widely deployed at massive
scale in several clouds and data centers. Mesos aims to provide high cluster
utilization via fine-grained resource co-scheduling, and resource fairness among
multiple users through Dominant Resource Fairness (DRF) based allocation. DRF
takes into account the different resource types (CPU, memory, disk I/O) requested
by each application and determines the share of each cluster resource that
could be allocated to the applications. Mesos adopts a two-level scheduling
policy: (1) DRF allocates resources to competing frameworks, and (2) each
framework schedules its tasks on the resources allocated in the previous step.
We conducted experiments on a local Mesos cluster running frameworks such as
Apache Aurora, Marathon, and our own framework, Scylla, to study resource
fairness and cluster utilization. Experimental
results show how informed decisions about the frameworks' second-level
scheduling policies and about attributes such as offer holding period, offer
refusal cycle, and task arrival rate can reduce unfair resource distribution.
A bin-packing scheduling policy in Scylla with Marathon can reduce unfair
allocation from 38% to 3%. By reducing unused free resources in offers, we
bring down unfairness from 90% to 28%. We also show how the task arrival rate
affects unfairness, which can be reduced from 23% to 7%.
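As a rough illustration of the first scheduling level described above, the following Python sketch implements DRF progressive filling: the framework with the lowest dominant share (its largest fraction of any single cluster resource) receives the next task's worth of resources. It is a simplification under stated assumptions: it allocates resources directly per task rather than through Mesos resource offers, it assumes each framework has one fixed per-task demand vector, and it keeps serving other frameworks after one framework's next task no longer fits. The names `Framework`, `dominant_share`, `fits`, and `drf_allocate` are hypothetical and are not part of the Mesos codebase.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality, so list.remove() drops the exact object
class Framework:
    name: str
    demand: dict                 # hypothetical per-task vector, e.g. {"cpus": 1, "mem": 4}
    allocated: dict = field(default_factory=dict)
    tasks: int = 0


def dominant_share(fw, capacity):
    """Largest fraction of any single cluster resource held by fw."""
    return max(fw.allocated.get(r, 0) / capacity[r] for r in capacity)


def fits(demand, free):
    """True if one task with this demand fits in the free resources."""
    return all(demand.get(r, 0) <= free[r] for r in free)


def drf_allocate(frameworks, capacity):
    """Progressive filling: repeatedly serve the lowest dominant share.

    Returns the resources left unallocated.
    """
    free = dict(capacity)
    active = list(frameworks)
    while active:
        # Ties go to the framework listed first.
        fw = min(active, key=lambda f: dominant_share(f, capacity))
        if not fits(fw.demand, free):
            active.remove(fw)    # its next task no longer fits; stop serving it
            continue
        for r, amount in fw.demand.items():
            free[r] -= amount
            fw.allocated[r] = fw.allocated.get(r, 0) + amount
        fw.tasks += 1
    return free


if __name__ == "__main__":
    a = Framework("A", {"cpus": 1, "mem": 4})
    b = Framework("B", {"cpus": 3, "mem": 1})
    left = drf_allocate([a, b], {"cpus": 9, "mem": 18})
    print(a.name, a.tasks, a.allocated)
    print(b.name, b.tasks, b.allocated)
    print("free:", left)
```

With 9 CPUs and 18 GB of memory, where framework A asks for 1 CPU and 4 GB per task and B for 3 CPUs and 1 GB, A ends with 3 tasks (3 CPUs, 12 GB) and B with 2 tasks (6 CPUs, 2 GB), so both reach a dominant share of 2/3 and 4 GB of memory stays free.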
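The bin-packing idea for second-level scheduling can be sketched in Python as a best-fit-decreasing placement of a framework's pending tasks onto the offers it currently holds, so that offers left untouched can be declined and returned to the allocator instead of sitting idle. This is an assumed heuristic chosen for illustration, not Scylla's actual implementation; the `bin_pack` function, the `(task_id, cpus, mem)` task tuples, and the offer dictionary format are hypothetical.

```python
def bin_pack(tasks, offers):
    """Best-fit-decreasing placement of tasks onto resource offers.

    tasks:  list of (task_id, cpus, mem) tuples
    offers: dict offer_id -> {"cpus": ..., "mem": ...} of free resources
    Returns (placement, unused), where placement maps task_id -> offer_id
    and unused lists offers that received no task and could be declined.
    """
    free = {oid: dict(res) for oid, res in offers.items()}  # do not mutate input
    used = set()
    placement = {}
    # Place the largest tasks first so they are not stranded by fragmentation.
    for task_id, cpus, mem in sorted(tasks, key=lambda t: (t[1], t[2]), reverse=True):
        candidates = [oid for oid, res in free.items()
                      if res["cpus"] >= cpus and res["mem"] >= mem]
        if not candidates:
            continue  # leave the task pending for a later offer
        # Prefer offers already in use, then the one left with the least CPU.
        best = min(candidates,
                   key=lambda oid: (oid not in used, free[oid]["cpus"] - cpus))
        free[best]["cpus"] -= cpus
        free[best]["mem"] -= mem
        used.add(best)
        placement[task_id] = best
    unused = [oid for oid in offers if oid not in used]
    return placement, unused


if __name__ == "__main__":
    offers = {f"agent{i}": {"cpus": 4, "mem": 8} for i in (1, 2, 3)}
    tasks = [("t1", 2, 2), ("t2", 1, 1), ("t3", 1, 2), ("t4", 2, 4)]
    placement, unused = bin_pack(tasks, offers)
    print(placement)
    print("decline:", unused)
```

In this example, tasks `t4` and `t1` land on `agent1`, `t3` and `t2` on `agent2`, and `agent3` comes back unused. Declining such an offer promptly, rather than holding it, is one way to shrink the unused free resources in offers that the abstract links to unfairness.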
Tromino: Demand and DRF Aware Multi-Tenant Queue Manager for Apache Mesos Cluster
Apache Mesos, a two-level resource scheduler, provides resource sharing
across multiple users in a multi-tenant cluster environment. Computational
resources (e.g., CPU, memory, and disk) are distributed according to the
Dominant Resource Fairness (DRF) policy. Mesos frameworks (users) receive
resources based on their current usage and are responsible for scheduling their
tasks within the allocation. We have observed that multiple frameworks can
cause a fairness imbalance in a multi-user environment. For example, a greedy
framework consuming more than its fair share of resources can deny resource
fairness to others. The DRF module considers the user with the lowest dominant
share first when allocating resources. However, the default DRF
implementation in the Apache Mesos master's allocation module does not consider
the overall resource demands of the tasks queued for each user/framework.
This lack of awareness can result in users without any pending tasks receiving
more resource offers while users with a queue of pending tasks starve due to
their high dominant shares. We have developed a policy-driven queue manager,
Tromino, for an Apache Mesos cluster where tasks for individual frameworks can
be scheduled based on each framework's overall resource demands and current
resource consumption. Tromino's awareness of dominant share and demand, and
its scheduling based on these attributes, can reduce (1) the impact of
unfairness due to framework-specific configuration, and (2) unfair waiting time
due to higher resource demand in a pending task queue. In the best case, the
proposed Demand-DRF aware policy lets Tromino significantly reduce the average
waiting time of a framework.
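To make the contrast concrete, the following Python sketch compares plain DRF selection with a demand-aware variant in the spirit of the policy described above: frameworks with empty task queues are skipped, and among the rest the lowest dominant share wins, with ties broken toward the larger pending demand. This is a hypothetical simplification, not Tromino's actual policy or code; the `pick_framework` and `pending_demand` functions, the tie-breaking rule, and the dictionary layout are assumptions made for illustration.

```python
from collections import deque


def dominant_share(resources, capacity):
    """Largest fraction of any single cluster resource held (or requested)."""
    return max(resources.get(r, 0) / capacity[r] for r in capacity)


def pending_demand(queue, capacity):
    """Hypothetical demand metric: sum of the dominant shares of queued tasks."""
    return sum(dominant_share(task, capacity) for task in queue)


def pick_framework(frameworks, capacity, demand_aware=True):
    """Return the name of the framework to serve next, or None.

    frameworks maps a name to {"alloc": {resource: amount},
                               "queue": deque of per-task demand dicts}.
    Plain DRF ranks every framework by dominant share only; the demand-aware
    variant drops frameworks with no pending tasks and breaks ties by demand.
    """
    candidates = [name for name, fw in frameworks.items()
                  if fw["queue"] or not demand_aware]
    if not candidates:
        return None

    def rank(name):
        fw = frameworks[name]
        share = dominant_share(fw["alloc"], capacity)
        if not demand_aware:
            return (share, 0.0)
        # Lowest dominant share first; on ties, larger pending demand first.
        return (share, -pending_demand(fw["queue"], capacity))

    return min(candidates, key=rank)


if __name__ == "__main__":
    capacity = {"cpus": 10, "mem": 20}
    frameworks = {
        # Low dominant share (0.1) but nothing waiting to run.
        "idle": {"alloc": {"cpus": 1, "mem": 2}, "queue": deque()},
        # Higher dominant share (0.4) with a task waiting.
        "busy": {"alloc": {"cpus": 4, "mem": 4},
                 "queue": deque([{"cpus": 2, "mem": 2}])},
    }
    print("plain DRF:   ", pick_framework(frameworks, capacity, demand_aware=False))
    print("demand-aware:", pick_framework(frameworks, capacity))
```

Here plain DRF picks `idle` (dominant share 0.1) even though it has nothing to run, while the demand-aware variant picks `busy` (dominant share 0.4), whose queued task would otherwise keep waiting.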
Evaluation of Docker Containers for Scientific Workloads in the Cloud
The HPC community is actively researching and evaluating tools to support
execution of scientific applications in cloud-based environments. Among the
various technologies, containers have recently gained importance because they
offer significantly better performance than full-scale virtualization, support
microservices and DevOps, and work seamlessly with workflow and
orchestration tools. Docker is currently the leader in containerization
technology because it offers low overhead, flexibility, portability of
applications, and reproducibility. Singularity is another container solution
that is of interest as it is designed specifically for scientific applications.
It is important to analyze the performance and features of container
technologies to understand their applicability to each application and target
execution environment. This paper presents (1) a performance evaluation of
Docker and Singularity on bare-metal nodes in the Chameleon cloud, (2) a
mechanism by which Docker containers can be mapped to InfiniBand hardware with
RDMA communication, and (3) an analysis of mapping elements of parallel
workloads to containers for optimal resource management with container-ready
orchestration tools. Our experiments are targeted toward application developers so that they
can make informed decisions on choosing the container technologies and
approaches that are suitable for their HPC workloads on cloud infrastructure.
Our performance analysis shows that scientific workloads in both Docker- and
Singularity-based containers can achieve near-native performance. Although
Singularity is designed specifically for HPC workloads, Docker still has
advantages over Singularity in clouds because it provides overlay networking
and an intuitive way to run MPI applications with one container per rank for
fine-grained resource allocation.